Article 5117

Title of the article

MACHINE LEARNING FOR THE OBJECTIVE OF MORPHOLOGICAL TAGGING AND POS
DETERMINATION IN INFLEXIONAL LANGUAGES 

Authors

Tarasov Dmitriy Viktorovich, Candidate of engineering sciences, associate professor, sub-department of higher and applied mathematics, Penza State University (40, Krasnaya street, Penza, Russia), tarasovdv@mail.ru
Romanov Nikita Alekseevich, Master’s degree student, Lomonosov Moscow State University (1 Leninskie gory street, Moscow, Russia), nikromanov1995@gmail.com

Index UDK

 004.9

DOI

 10.21685/2072-3059-2017-1-5

Abstract

Background. Computer analysis of texts is needed for many routine tasks in IT fields such as website promotion, and part-of-speech (POS) tagging is one of its stages. Morphological tagging is, however, much harder for Russian than for English. Existing high-quality libraries that can perform it are either slow or require additional analysis. The aim of the work is to implement methods for recognizing (tagging) Russian-language text.
Materials and methods. The recognition methods were implemented with support vector machines, treating tagging as an object classification problem, and were trained on the Russian National Corpus (SynTagRus). The machine learning software was written in C/C++.
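The keywords point to linear classifiers and the Karush–Kuhn–Tucker conditions. For reference, the standard soft-margin linear SVM for one binary decision (for example "this word is a noun" versus "not a noun") is written below with its KKT conditions and dual. Here x_i is the feature vector of the i-th of ℓ training words and y_i ∈ {−1, +1}. This is the textbook formulation, not necessarily the exact one the authors used; the abstract also does not say whether the tags were handled one-vs-rest or with a joint multiclass formulation such as Crammer–Singer.

```latex
\begin{aligned}
&\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{\ell}\xi_i
 \quad\text{s.t.}\quad y_i\bigl(\langle w, x_i\rangle + b\bigr) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\\[4pt]
&\mathcal{L} = \frac{1}{2}\lVert w\rVert^2 + C\sum_{i}\xi_i
 - \sum_{i}\alpha_i\bigl[y_i(\langle w, x_i\rangle + b) - 1 + \xi_i\bigr] - \sum_{i}\mu_i\xi_i,\\[4pt]
&\text{KKT:}\quad w = \sum_{i}\alpha_i y_i x_i,\qquad \sum_{i}\alpha_i y_i = 0,\qquad \alpha_i + \mu_i = C,\\
&\qquad\ \alpha_i \ge 0,\ \ \mu_i \ge 0,\ \ \alpha_i\bigl[y_i(\langle w, x_i\rangle + b) - 1 + \xi_i\bigr] = 0,\ \ \mu_i\xi_i = 0,\\[4pt]
&\text{dual:}\quad \max_{\alpha}\ \sum_{i}\alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\langle x_i, x_j\rangle
 \quad\text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_{i}\alpha_i y_i = 0.
\end{aligned}
```

Complementary slackness sorts the training words into three groups. If α_i = 0, the word lies outside the margin (y_i(⟨w, x_i⟩ + b) ≥ 1). If 0 < α_i < C, then μ_i > 0 forces ξ_i = 0, so the word lies exactly on the margin. If α_i = C, the word may violate the margin (ξ_i ≥ 0). Only words with α_i > 0 (the support vectors) contribute to w.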
Results. The study proposes a classification algorithm for text analysis and POS determination in Russian-language texts on various topics. High-quality analysis with this algorithm requires a highly representative training sample. The authors also propose an effective scheme for selecting parameters (tags) when setting up the learning procedure.
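The abstract gives neither the features nor the solver, so the C++17 sketch below is only an illustration under stated assumptions, not the authors' algorithm. It uses hypothetical features: the last 1 to 3 bytes of a word plus a bias, on transliterated toy words rather than SynTagRus data. It trains one binary linear SVM per tag (one-vs-rest) by minimizing λ/2·‖w‖² plus the mean hinge loss with Pegasos-style subgradient steps (step size 1/(λt)). This is a swapped-in primal method instead of a dual solver built on the KKT conditions, and examples are visited in a fixed cyclic order rather than sampled at random, which keeps the run deterministic. The names `suffixFeatures` and `OneVsRestSvm` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical feature set: a bias feature plus the word's final 1..3 bytes.
// In UTF-8 a Cyrillic letter takes two bytes, so real code would cut suffixes
// by characters; the toy data below is transliterated to keep this simple.
std::vector<std::string> suffixFeatures(const std::string& word) {
    std::vector<std::string> feats{"<bias>"};
    for (std::size_t k = 1; k <= 3 && k <= word.size(); ++k)
        feats.push_back("suf:" + word.substr(word.size() - k));
    return feats;
}

// One binary linear SVM per tag (one-vs-rest), each trained with
// Pegasos-style subgradient steps on  lambda/2 * |w|^2 + mean hinge loss.
class OneVsRestSvm {
public:
    OneVsRestSvm(double lambda, int epochs) : lambda_(lambda), epochs_(epochs) {}

    void train(const std::vector<std::pair<std::string, std::string>>& data) {
        std::vector<std::vector<std::size_t>> xs;  // active feature indices per word
        std::vector<std::size_t> ys;               // tag index per word
        for (const auto& [word, tag] : data) {
            std::vector<std::size_t> x;
            for (const auto& f : suffixFeatures(word))  // new features get the next index
                x.push_back(featIndex_.try_emplace(f, featIndex_.size()).first->second);
            xs.push_back(std::move(x));
            ys.push_back(tagIndex(tag));
        }
        weights_.assign(tags_.size(), std::vector<double>(featIndex_.size(), 0.0));
        for (std::size_t c = 0; c < tags_.size(); ++c) {
            std::vector<double>& w = weights_[c];
            std::size_t t = 0;
            for (int e = 0; e < epochs_; ++e) {
                for (std::size_t i = 0; i < xs.size(); ++i) {
                    double eta = 1.0 / (lambda_ * static_cast<double>(++t));
                    double y = (ys[i] == c) ? 1.0 : -1.0;
                    double margin = y * score(w, xs[i]);  // uses w before the update
                    for (double& wj : w) wj *= 1.0 - eta * lambda_;  // regularizer step
                    if (margin < 1.0)                                // hinge subgradient
                        for (std::size_t j : xs[i]) w[j] += eta * y;
                }
            }
        }
    }

    // Returns the tag whose classifier scores highest; assumes train() was called.
    // Suffixes never seen in training are ignored.
    std::string predict(const std::string& word) const {
        std::vector<std::size_t> x;
        for (const auto& f : suffixFeatures(word)) {
            auto it = featIndex_.find(f);
            if (it != featIndex_.end()) x.push_back(it->second);
        }
        std::size_t best = 0;
        for (std::size_t c = 1; c < tags_.size(); ++c)
            if (score(weights_[c], x) > score(weights_[best], x)) best = c;
        return tags_[best];
    }

private:
    // Binary features: the score is the sum of the active weights.
    static double score(const std::vector<double>& w, const std::vector<std::size_t>& x) {
        double s = 0.0;
        for (std::size_t j : x) s += w[j];
        return s;
    }

    std::size_t tagIndex(const std::string& tag) {
        for (std::size_t c = 0; c < tags_.size(); ++c)
            if (tags_[c] == tag) return c;
        tags_.push_back(tag);
        return tags_.size() - 1;
    }

    double lambda_;
    int epochs_;
    std::unordered_map<std::string, std::size_t> featIndex_;
    std::vector<std::string> tags_;
    std::vector<std::vector<double>> weights_;
};
```

For example, after `OneVsRestSvm svm(0.1, 500)` is trained on {"kniga", "doroga"} as NOUN, {"chitat'", "govorit'"} as VERB and {"krasnyy", "novyy"} as ADJ, the unseen word "znat'" shares all of its suffix features with "chitat'" and is tagged VERB. This also illustrates why such a model needs a representative sample: a word whose endings never occur in training is scored only by the bias and whatever short suffixes it shares.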
Conclusions. The machine learning procedure achieved an efficiency of about 87–95 % in POS analysis of sentences from various topical areas, using Russian as an example. It can be used in computer analysis of texts for IT purposes.

Key words

support vector machine, machine learning procedure, linear classifiers, Karush–Kuhn–Tucker conditions.


 

Date created: 08.08.2017 15:45
Date updated: 09.08.2017 15:36